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Abstract 

The local linear embedding algorithm (LLE) is a non-linear dimension-reducing technique, 
widely used due to its computational simplicity and intuitive approach. LLE first linearly 
reconstructs each input point from its nearest neighbors and then preserves these neighbor- 
hood relations in the low-dimensional embedding. We show that the reconstruction weights 
computed by LLE capture the /li^/i-dimensional structure of the neighborhoods, and not 
the Zow-dimensional manifold structure. Consequently, the weight vectors are highly sensi- 
tive to noise. Moreover, this causes LLE to converge to a linear projection of the input, as 
opposed to its non-linear embedding goal. To overcome both of these problems, we propose 
to compute the weight vectors using a low-dimensional neighborhood representation. We 
prove theoretically that this straightforward and computationally simple modification of 
LLE reduces LLE's sensitivity to noise. This modification also removes the need for regular- 
ization when the number of neighbors is larger than the dimension of the input. We present 
numerical examples demonstrating both the perturbation and linear projection problems, 
and the improved outputs using the low-dimensional neighborhood representation. 

Keywords: Locally Linear Embedding (LLE), dimension reduction , manifold learning, 



1. Introduction 



The local linear embedding algorithm (LLE) (Roweis and Saul 20001 belongs to a class of 



recently developed, non-linear dimension-reducing algorithms that include Isomap (Tenen 



baum et al. 



and Grimes 



2000|), Laplacian Eigenmap ( Belkin and Niyogi , 2003 1 , Hessian Eigenmap ( Dono lo 



2004), LTSA (Zhang and Zha 2004 1, and MVU (Weinberger and Saul 2006 1 . 



This group of algorithms assumes that the data is sitting on, or next to, an embedded 
manifold of low dimension within the original high- dimensional space, and attempts to find 
an embedding that maps the input points to the lower-dimensional space. Here a manifold 
is defined as a topological space that is locally equivalent to an Euclidean space. LLE was 
found to be useful in data visualization (Roweis and Saul, 2000; |Xu et al.| |2008|) and in 



image processing applications, such as image denoising (Shi et al 



20051 and human face 



detection (Chen et al. , 2007). It is also applied in different fields of science such as chem- 
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istry (iL'Heureux et al.l 120041), biology (I Wang et al.l |2005|), and astrophysics (IXu et al. 



20061. 



LLE attempts to recover the domain structure of the input data set in three steps. First, 
LLE assigns neighbors to each input point. Second, for each input point LLE computes 
weight vectors that best Hnearly reconstruct the input point from its neighbors. Finahy, 
LLE finds a set of low-dimensional output points that minimize the sum of reconstruction 
errors, under some normalization constraints. 

In this paper we focus on the computation of the weight vectors in the second step of 
LLE. We show that LLE's neighborhood description captures the structure of the high- 
dimensional space, and not that of the /ow;-dimensional domain. We show two main con- 
sequences of this observation. First, the weight vectors are highly sensitive to noise. This 
implies that a small perturbation of the input may yield an entirely different embedding. 
Second, we show that LLE converges to a linear projection of the high-dimensional input 
when the number of input points tends to infinity. Numerical results that demonstrate our 
claims are provided. 

To overcome these problems, we suggest a simple modification to the second step of 
LLE, LLE with low- dimensional neighborhood representation. Our approach is based on 
finding the best low- dimensional representation for the neighborhood of each point, and then 
computing the weights with respect to these low-dimensional neighborhoods. This proposed 
modification preserves LLE's principle of reconstructing each point from its neighbors. It is 
of the same computational complexity as LLE and it removes the need to use regularization 
when the number of neighbors is greater than the input dimension. 

We prove that the weights computed by LLE with low-dimensional neighborhood rep- 
resentation are robust against noise. We also prove that when using the modified LLE on 
input points sampled from an isometrically embedded manifold, the pre-image of the input 
points achieves a low value of the objective function. Finally, we demonstrate an improve- 
ment in the output of LLE when using the low-dimensional neighborhood representation 
for several numerical examples. 



There are other works that suggest improvements for LLE. The Efficient LLE (Hadid 



and Pietikainen, 2003) and the Robust LLE (Chang and Yeung 20061 algorithms both 



address the problem of outliers by preprocessing the input data. Other versions of LLE, 
including ISOLLE (Varini et al. 20061 and Improved LLE (Wang et al. , 2006), suggest 
different ways to compute the neighbors of each input point in the first step of LLE. The 
Modified LLE algorithm (Zhang and Wang 2007) proposes to improve LLE by using mul- 
tiple local weight vectors in LLE's second step, thus characterizing the high-dimensional 
neighborhood more accurately. All of these algorithms attempt to characterize the high- 
dimensional neighborhoods, and not the /ow-dimensional neighborhood structure. 

Other algorithms can be considered variants of LLE. Laplacian Eigenmap essentially 
computes the weight vectors using regularization with a large regularization constant (see 
discussion on the relation between LLE and Laplacian Eigenmap in Belkin and Niyogi} 



2003 Section 5). Hessian Eigenmap (Donoho and Grimes 2004) characterizes the local 



input neighborhoods using the null space of the local Hessian operator, and minimizes the 



appropriate function for the embedding. Closely related is the LTSA algorithm (Zhang 



and Zha 2004), which characterizes each local neighborhood using its local PCA. These 



last two algorithms attempt to describe the low-dimensional neighborhood. However, these 
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algorithms, like Laplacian Eigenmap, do not use LLE's intuitive approach of reconstruct- 
ing each point from its neighbors. Our proposed modification provides a low-dimensional 
neighborhood description while preserving LLE's intuitive approach. 

The paper is organized as follows. The description of LLE is presented in Section |2} 
The discussion of the second step of LLE appears in Section [3} The suggested modification 
of LLE is presented in Section |4j Theoretical results regarding LLE with low-dimensional 
neighborhood representation appear in Section |5} In Section [6] we present numerical exam- 
ples. The proofs are presented in the Appendix. 



2. Description of LLE 

The input data X = {xi, . . . , xat}, Xi G for LLE is assumed to be sitting on or next to 
a (i-dimensional manifold ^A. We refer to X a.s an N x D matrix, where each row stands 
for an input point. The goal of LLE is to recover the underlying d-dimensional structure of 
the input data X. LLE attempts to do so in three steps. 

First, LLE assigns neighbors to each input point Xj. This can be done, for example, 
by choosing the input point's ii'-nearest neighbors based on the Euclidian distances in the 
high-dimensional space. Denote by {rjj} the neighbors of Xj. Let the neighborhood matrix 
of Xi be denoted by Xi, where Xi is the K x D matrix with rows 7]j — Xi. 

Second, LLE computes weights Wij that best linearly reconstruct Xi from its neighbors. 
These weights minimize the reconstruction error function 

ipi{Wi) = \\xi -'^WijXjW'^ , (1) 

j 

where Wij = if xj is not a neighbor of Xi, and Wij = 1. With some abuse of notation, 
we will also refer to Wi as a K x 1 vector, where we omit the entries of Wi for non-neighbor 
points. Using this notation, we may write (pi{wi) = w^XiX^Wi. 

Finally, given the weights found above, LLE finds a set of low- dimensional output points 
Y = {yi, . . . , y^v} £ I^*^ that minimize the sum of reconstruction errors 

n 

HY)=Y.\\yi-Y.w,,yjf, (2) 

i=l j 

under the normalization constraints Y'l. = and Y'Y = /, where 1 is vector of ones. These 
constraints force a unique minimum of the function $. 

The function ^(Y) can be minimized by finding the d-bottom non-zero eigenvectors 
of the sparse matrix (/ — W)'{I — W), where W is the matrix of weights. Note that 
the p-th coordinate {p = 1, . . . found simultaneously for all output points y^, is equal 
to the eigenvector with the p-smallest non-zero eigenvalue. This means that the first p 
coordinates of the LLE solution in q dimensions, p < q, are exactly the LLE solution in p 



dimensions (Roweis and Saul 2000 Saul and Roweis, 2003). Equivalently, if an LLE output 



of dimension q exists, then a solution for dimension p, p < q, is merely a linear projection 
of the g-dimensional solution on the first p dimensions. 

When the number of neighbors K is greater than the dimension of the input D, each 
data point can be reconstructed perfectly from its neighbors, and the local reconstruction 
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weights are no longer uniquely defined. In this case, regularization is needed and one needs 
to minimize 



(p''j^^{wi) = \\xi - '^WijXjW'^ + 6\\wi\\'^ . 



(3) 



where 5 is a small constant. Saul and Roweis (2003 1 suggested 5 = ^trace(XjXj') with A <C 
1. Regularization can be problematic for the following reasons. When the regularization 
constant is not small enough, it was shown by Zhang and Wang ( |2007 | that the correct 
weight vectors cannot be well approximated by the minimizer of (pl^^{wi). Moreover, when 
the regularization constant is relatively high, it produces weight vectors that tend towards 
the uniform vectors Wi = {1/K, . . . , 1/K). Consequently, the solution for LLE with large 
regularization constant is close to that of Laplacian Eigenmap, and does not reflect a solution 



based on reconstruction weight vectors (see Belkin and Niyogi 2003 Section 5). In addition. 



Lee and Verleysen (20071 demonstrated that the regularization parameter must be tuned 



carefully, since LLE can yield completely different embeddings for different values of this 
parameter. However, in real-world data the dimension of the input is typically greater than 
the number of neighbors. Hence, for real-world data, regularization is usually unnecessary. 



3. Preservation of high-dimensional neighborhood structure by LLE 

In this section we focus on the computation of the weight vectors, which is performed in the 
second step of LLE. We first show that LLE characterizes the /lig/i-dimensional structure 
of the neighborhood. We explain how this can lead to the failure of LLE in finding a 
meaningful embedding of the input. Two additional consequences of preservation of the 
high-dimensional neighborhood structure are discussed. First, LLE's weight vectors are 
sensitive to noise. Second, LLE's output tends toward a linear projection of the input data 
when the number of input points tends to infinity. These claims are demonstrated using 
numerical examples. 

We begin by showing that LLE preserves the high-dimensional neighborhood structure. 
We use the example that appears in Fig [T] The input is a sample from an open ring 
which is a one-dimensional manifold embedded in M?. For each point on the ring, we define 
its neighborhood using its 4 nearest neighbors. Note that its /li^/i-dimensional {D = 2) 
neighborhood structure is curved, while the low-dimensional structure (d = 1) is a straight 
line. The two-dimensional output of LLE (see Fig. [T]) is essentially a reconstruction of the 
input. In other words, LLE's weight vectors preserve the curved shape of each neighborhood. 

The one-dimensional output of the open ring is presented in Fig[Tp. Recall that the one- 
dimensional solution is a linear projection of the two-dimensional solution, as explained in 
section[2| In the open-ring example, LLE clearly fails to find an appropriate one-dimensional 
embedding, because it preserves the two-dimensional curved neighborhood structure. We 
now show that this is also true for additional examples. 

The 'S' curve input data appears in Fig |2]\. Fig [2p shows that the overall three- 
dimensional structure of the 'S' curve is preserved in the three-dimensional embedding. 
The two-dimensional output of LLE appears in Fig[2p. It can be seen that LLE does not 
succeed in finding a meaningful embedding in this case. Fig [3] presents the swissroll, with 
similar results. 
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Figure 1: The input for LLE is the 16-point open ring that appears in (A). The two- 
dimensional output of LLE is given in (B). LLE finds and preserves the two- 
dimensional structure of each of the local neighborhoods. The one-dimensional 
output of LLE appears in (C). The computation was performed using 4- nearest- 
neighbors, and regularization constant A = 10"^. 



We performed LLE, here and in all other examples, using the LLE Matlab code as it 
appears on the LLE website ( Saul and Roweis| P] The code that produced the input data for 
the 'S' curve and the swissroll was also taken from the LLE website. We used the default 
values of 2000-point samples and 12-nearest-neighbors. For the regularization constant we 
used A = 10^^. It should be noted that using a large regularization constant improved the 
results. However, as discussed in Section [2| the weight vectors produced in this way do not 
reflect a solution that is based on reconstruction weight vectors. Instead, the vectors tend 
toward the uniform vector. 

We now discuss the sensitivity of LLE's weight vectors {wi} to noise. Figure |4] shows 
that an arbitrarily small change in the neighborhood can cause a large change in the weight 
vectors. This result can be understood by noting how the vector Wi is obtained. It can be 
shown (Saul and Roweis 20031 that Wi equals {XiXl)~^l, up to normalization. Sensitivity 



to noise is therefore expected when the condition number of XiX'^ is large (see Golub and 
Loan, 1983, Section 2). One way to solve this problem is to enforce regularization, with 
its associated problems (see section [2]) . In the next section we suggest a simple alternative 
solution to the sensitivity of LLE to noise. 

One more implication of the fact that LLE preserves the high-dimensional neighborhood 



structure is that LLE's output tends to a linear projection of the input data. Wu and Hu 



(20061 proved for a finite data set that when the reconstruction errors are exactly zero for 



each of the neighborhoods, and under some dimensionality constraint, the output of LLE 
must be a linear projection of the input data. Here, we present a simple argument that 



1. The changes in the Matlab function eigs were taken into account. 
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Figure 2: (A) LLE's input, a 2000-point 'S' curve. (B) The three-dimensional output of 
LLE. It can be seen that LLE finds the overall three-dimensional structure of the 
input. (C) The two-dimensional output of LLE. 




B 





Figure 3: (A) LLE's input, a 2000-point swissroll. (B) The three-dimensional output of 
LLE. It can be seen that LLE finds the overall three-dimensional structure of the 
input. (C) The two-dimensional output of LLE. 



explains why LLE's output tends to a linear projection when the number of input points 
tends to infinity, and show numerical examples that strengthen this claim. For simplicity, 
we assume that the input data is normalized. 
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Figure 4: The effect of a small perturbation on the weight vector computed by LLE. All 
three panels show the same unperturbed neighborhood, consisting of a point and 
its four nearest-neighbors (black points), all sitting in the two-dimensional plane. 
Each panel shows a different small perturbation of the original neighborhood 
(gray points). All perturbations are in the direction orthogonal to the plane of the 
original neighborhood. (A) and (C): Both perturbations are in the same direction. 
(B) Perturbations are of equal size, in opposite directions. The unique weight 
vector for the center point is denoted for each case. These three different weight 
vectors vary widely, even though the different perturbations can be arbitrarily 
small. 



Our argument is based on two claims. First, note that LLE's output for dimension d is 
a linear-projection of LLE's output for dimension D (see Section [2]). Second, note that by 
definition, the LLE output is a set of points Y that minimizes the sum of reconstruction 
errors ^{Y). For normalized input X of dimension D, when the number of input points 
tends to infinity, each point is well reconstructed by its neighboring points. Therefore the 
reconstruction error ipi{w) tends to zero for each point Xi. This means that the input data 
X tends to minimize the sum of reconstruction errors ^iY). Hence, the output points Y 
of LLE for output of dimension D tend to the input points (up to a rotation). The result 
of these two claims is that any requested solution of dimension d < D tends to a linear 
projection of the D-dimensional solution, i.e., a linear projection of the input data. 

The result that LLE tends to a linear projection is of asymptotical nature. However, 
numerical examples show that this phenomenon can occur even when the number of points 
is relatively small. This is indeed the case for the outputs of LLE shown in Figs. [Tp, |2p, 
and[3p, for the open ring, the 'S' curve, and the swissroll, respectively. 

4. Low-dimensional neighborhood representation for LLE 

In this section we suggest a simple modification of LLE that computes the low-dimensional 
structure of the input points' neighborhoods. Our approach is based on finding the best 
representation of rank d (in the I2 sense) for the neighborhood of each point, and then com- 
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puting the weights with respect to these d-dimensional neighborhoods. In Sections [5] and [6] 
we show theoretical results and numerical examples that justify our suggested modification. 

We begin by finding a rank-d representation for each local neighborhood. Recall that 
Xi is the K X D neighborhood matrix of Xj, whose j-th row is rij — Xi, where r]j is the 
j-th neighbor of Xi. We assume that the number of neighbors K is greater than d, since 
otherwise Xi cannot (in general) be reconstructed by its neighbors. We say that Xf is the 
best rank-d representation of Xi, if Xf minimizes \\Xi — Y\\^ over all the K x D matrices 
Y of rank d. Let ULV' be the SVD of Xi, where U and V are orthogonal matrices of size 
K X K and D x D, respectively, and L is a K x D matrix, where Ljj = Xj are the singular 
values of Xi for j = mm{K,D), ordered from the largest to the lowest, and Lij = for 
i ^ j. We denote 

U={Ui, C/2 ) ; L = ( ^Q^' ^^y,V={ Vi, V2 ) (4) 

where Ui = {ui, . . . , Ud) and Vi = (f i, . . . , Vd) are the first d columns of U and V, respec- 
tively, U2 and V2 are the last K — d and D — d columns of U and V respectively, and Li 
and L2 are of dimension d x d and {K — d) x {D — d), respectively. Then by Corollary 2.3-3 



of |Golub and Loan] ( |l983| ), X[ can be written as UiLiVi- 

We now compute the weight vectors for the d-dimensional neig hborhoodXf . By g, we 
need to find Wi that minimize w'iXf Xf'wi (see Section [2]). The solution for this minimiza- 
tion problem is not unique, since by the construction all the vectors spanned by Ud+i, ■ ■ ■ , uk 
zero this function. Thus, our candidate for the weight vector is the vector in the span of 
Ud+i, • • ■ , Uk that has the smallest I2 norm. In other words, we are looking for 

argmin • (5) 

Wi&pan{ua+i,--.,UK} 
<1=1 

Note that we implicitly assume that 1 ^ spanjui, . . . ,Ud}- This is true whenever the 
neighborhood points are in general position, i.e., no d-\- 1 of them lie in a (d— l)-dimensional 
plane. To understand this, note that if 1 G spanjui, . . . , Ud} then (/ — ^ll')Xf = {I — 
^ll')UiLiVl is of rank d — 1. Since (/ — ^11') Xf is the projected neighborhood after 
centering, we obtained that the dimension of the centered projected neighborhood is of 
dimension d — 1, and not d as assumed, and therefore the points are not in general position. 
See also Assumption (A[2]) in Section [5] and the discussion that follows. 

The following Lemma shows how to compute the vector Wi that minimizes ([5]). 

Lemma 4.1 Assume that the points of Xf are in general position. Then the vector Wi that 
minimizes ^ is given by 

U2U2I 

The proof is based on Lagrange multipliers and appears in Appendix |A.1[ 



Following Lemma 4.1 we propose a simple modification for LLE based on computing 



the reconstruction vectors using d-dimensional neighborhood representation. 
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Algorithm: LLE with low-dimensional neighborhood representation 



Input: X, an N X D matrix. 
Output: Y, an N X d matrix. 

Procedure: 



1. For each point Xj find i^-nearest-neighbors and compute the neighborhood matrix Xi. 

2. For each point Xi compute the weight vector Wi using the d-dimensional neighborhood 
representation: 

• Use the SVD decomposition to write Xi = ULV'. 

• Write U2 = {ud+i ur)- 

• Compute 

U2U2'1 

Wi 



l'U2U2'l ■ 

3. Compute the d-dimension embedding by minimizing ^(Y) (see (|2 



Note that the difference between this algorithm and LLE is in step (2). We compute the 
low- dimensional neighborhood representation of each neighborhood and obtain its weight 
vector, while LLE computes the weight vector for the original high-dimensional neighbor- 
hoods. One consequence of this approach is that the weight vectors Wi are less sensitive 



to perturbation, as shown in Theorem 5.1 Another consequence is that the d-dimensional 
output is no longer a projection of the embedding in dimension q, q > d. This is because 
the weight vectors Wi are computed differently for different values of output dimension d. In 
particular, the input data no longer minimize ^, and therefore the linear projection problem 
does not occur. 

From a computational point of view, the cost of this modification is small. For each 
point Xj, the cost of computing the SVD of the matrix Xi is 0{DK^). For N neighborhoods 
we have 0{NDK^) which is of the same scale as LLE for this step. Since the overall 
computation of LLE is 0{N'^D), the overhead of the modification has little influence on the 



running time of the algorithm (see Saul and Roweis 2003 Section 4). 



5. Theoretical results 

In this section we prove two theoretical results regarding the computation of LLE using the 
low-dimensional neighborhood representation. We first show that a small perturbation of 
the neighborhood has a small effect on the weight vector. Then we show that the set of 
original points in the low-dimensional domain that are the pre-image of the input points 
achieve a low value of the objective function <I>. 

We start with some definitions. Let $7 C M'^ be a compact set and let / : ^ be a 
smooth conformal mapping. This means that the inner products on the tangent bundle at 
each point are preserved up to a scalar c that may change continuously from point to point. 
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Note that the class of isometric embeddings is included in the class of conformal embeddings. 
Let Ai be the d-dimensional image of O in M^. Assume that the input X = {xi, . . . ,xn} 
is a sample taken from M. For each point Xi, define the neighborhood Xi and its low- 
dimensional representation Xf as in Section |4] Let Xi = ULV' and Xf = UiLiVi be the 
SVDs of the i-th neighborhood and its projection, respectively. Denote the singular values 
of Xj by A\ > ... > A}^, where A*- = if D < j < K. Denote the mean vector of the 
projected i-ih neighborhood by m = -^I'Xf. 

For the proofs of the theorems we require that the local high-dimensional neighborhoods 
satisfy the following two assumptions. 

(Al) For each i, X^^^ < Aj^. 

More specifically, it is enough to demand A^^-,^ < min |(A^)^, 

(A2) There is an a < 1 such that for all i, ^I'UiUi'l < a. 

The first assumption states that for each i, the neighborhood Xi is essentially d-dimensional. 
The second assumption was shown to be equivalent to the requirement that points in each 
projected neighborhood are in general position (see discussion in Section [s]). We now show 
that this is equivalent to the requirement that the variance-covariance matrix of the pro- 
jected neighborhood is not degenerate. Denote S = j^Xf Xf = j^ViL\V\ , then 

1,1'UiUi'l = l,l'{UiLiVi'){ViL-^Vi')ViLiUi'l = fi'S~^fi. 

Note that since S—^^' is positive definite, so is I—S~^/'^fifx'S~^^'^. Since the only eigenvalues 
of / — S"^^"^ fifi' S~^^'^ are 1 and 1 — fi'S~^iJ,, we obtain that fi'S^^fj, < 1. 

Theorem 5.1 Let Ei be a K x D matrix such that \\Ei\\p = 1. Let Xi = Xi + eEi he a per- 



turbation of the i-th neighborhood. Assume 



(j^) and (j^ 



2) and e < min 



72 ' 72 



and that A^^ < 1. Let Wi and Wi be the weight vectors for Xi and Xi, respectively, as defined 
by (H. Then 

II II 20e 



See proof in Appendix A. 3, Note that the assumption that A^ < 1 can always be fulfilled 
by rescaling the matrix Xi since rescaling the input matrix X has no influence on the value 

of Wi . 



Fig |4] demonstrates why no bound similar to Theorem 5.1 exists for the weights computed 



by LLE. In the example we see a point on the grid with its 4-nearest neighbors, where some 
noise was added. While Ai~A2~l — and e is arbitrary, the distance between 



each pair of vectors is at least ^. The bound of Theorem 5.1 states that for e = 10 ^,10 ^ 
and 10"*^ the upper bounds on the distance when using the low-dimensional neighborhood 
representation are is 20 • 10^^,20 • 10""^ and 20 • 10"*^ respectively. The empirical results 
shown in Fig [5] are even lower. 

For the second theoretical result we require some additional definitions. 
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500 1000 500 1000 500 1000 

Figure 5: The effect of neighborhood perturbation on the weight vectors of LLE and of LLE 
with low-dimensional neighborhood representation. The original neighborhood 
consists of a point on the two-dimensional grid and its 4-nearest neighbors, as 
in Fig. |4j A 4-dimensional noise matrix eE where H-EHf = 1 was added to the 
neighborhood for e = 10~^, 10^^ and 10"^, with 1000 repetitions for each value 
of e. Note that no regularization is needed since K = D. The graphs show the 
distance between the vector ii'=(|,|,j,3) and the vectors computed by LLE (in 
green) and by LLE with low-dimensional neighborhood representation (in blue). 
Note the log scale in the y axis. 



The minimum radius of curvature ro = ro(A^) is defined as follows: 

— = max I ||7(t) II I 
ro -y^t LM ' WIN 

where 7 varies over all unit-speed geodesies in ^A and t is in a domain of 7. 

The minimum branch separation sq = sq{M) is defined as the largest positive number 
for which Hx— 5;|| < sq implies dM{x, x) < vrro, where x,x £ Ai and d^i^, x) is the geodesic 



distance between x and x (see Bernstein et al. 2000 for both definitions). 



Define the radius r(i) of neighborhood i to be 

r(i) = max II??,— 

where r/j is the j-th neighbor of xi. Finally, define rmax to be the maximum over r(i) . 

We say that the sample is dense with respect to the chosen neighborhoods if rmax < sq. 
Note that this condition depends on the manifold structure, the given sample, and the 
choice of neighborhoods. However, for a given compact manifold, if the distribution that 
produces the sample is supported throughout the entire manifold, then this condition is 
valid with probability increasing towards 1 as the size of the sample is increased and the 
radius of the neighborhoods is decreased. 
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Theorem 5.2 Let ^ be a compact convex set. Let / : J7 ^ M'^ be a smooth conformal 
mapping. Let X be an N-point sample taken from f{^), and let Z = f~^{X), i.e., Zi = 
f~^{xi). Assume that the sample X is dense with respect to the choice of neighborhoods 
and that assumptions (J^and (.^j hold. Then, if the weight vectors are chosen according 
to (§, 

?j^ = maxAj^+iO(rLx) • (7) 



See proof in Appendix A.3 



The theorem states that the original pre-image data Z has a smah value of $ and thus 



is a reasonable embedding, although not necessarily the minimizer (see Goldberg et al. 



20081. This observation is not trivial from two reasons. First, it is not known a-priori 



that {/ ^(r7j)}, the pre-image of the neighbors of rr,, are also neighbors of = / ^(xj). 



When short-circuits occur, this need not to be true (see Balasubramanian et al. , 2002 1. 
Second, the weight vectors {wi} characterized the projected neighborhood, which is only 
an approximation to the true neighborhood. Nevertheless, the theorem shows the Z has a 
low $ value. 

6. Numerical results 

In this section we present empirical results for LLE and LLE with low-dimensional neigh- 
borhood representation on some data sets. For LLE, we used the Matlab code as appears 



in LLE website (Saul and Roweis). The code for LLE with low-dimensional neighborhood 
representation is based on the LLE code and differs only in step (2) of the algorithm and is 
available in JRhomepage . , 

We ran LLE with low-dimensional neighborhood representation on the data sets of 
the open ring, the 'S '-curve and the swissroU that appear in Figs [T]|3j We used the same 
parameters for both LLE and LLE with low-dimensional neighborhood representation {K = 
4 for the open ring and K = 12 for the 'S'-curve and the swissroll). The results appear in 
Fig [6} 

We ran both LLE and LLE with low-dimensional neighborhood representation on 64 
by 64 pixel images of a face, rendered with different poses and lighting directions. The 
698 images and their respective poses and lighting directions can be found at the Isomap 
webpage (Tenenbaum et al. |. The results of LLE, with K = 12, are given in Fig. [7| We 



also checked for K = 8,16; in all cases LLE does not succeed in retrieving the pose and 
lighting directions. The results for LLE with low-dimensional neighborhood representation, 
also with K = 12, appear in Fig [8| The left- right pose and the lighting directions were 
discovered by LLE with low-dimensional neighborhood representation. We also checked for 
K = 8, 16; the results are roughly the same. 
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Figure 6: The inputs appear in the left column. The results of LLE appear in the middle 
column and the the results of LLE with low-dimensional representation appear 
in right column. 



Appendix A. Proofs 
A.l Proof of Lemma 14.11 

Proof Write Wi = J2m=d+i (^mUm = U2a. The Lagrangian of the problem can be written 
as 

L(a, A) = ^a'U2'U2a + Xil'U2a - 1) . 
Taking derivatives with respect to both a and A, we obtain 

dL 

— = U2'U2a-XU2'l = a-XU2'l, 
oa 

dL 

— = l'U2a-l. 
OA 



Hence we obtain that a = tftj^jttt- 

VU2U2 1 



U2U2' 

13 



B 




5? 



c4 



* 



Figure 7: The first two dimensions out of tlie tliree-dimensional output of LLE for the faces 
database appear in all three panels. (A) is colored according to the right-left 
pose, (B) is colored according to the up-down pose, and (C) is colored according 
to the lighting direction. 




Figure 8: The output of LLE with low-dimensional neighborhood representation is colored 
according to the left-right pose. LLE with low-dimensional neighborhood repre- 
sentation also succeeds in finding the lighting direction. The up-down pose is not 
fully recovered. 
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A. 2 Proof of Theorem 15.11 

Proof The proof of Theorem |5.1| consists of two steps. First, we find a representation of 



the vector Wi, the weight vector of the perturbed neighborhood, see (12). Then we bound 
the distance between Wi and Wi, the weight vector of the original neighborhood. 

We start with some notations. For every matrix A, let Xj{A) be the j-th singular value of 
A. Note that ||A||2 = Xi{A). In this notation, we have A*- = Xj{Xi). Denote by T = X/Xi 
and f = X'-Xi = T + eiXiEi + EiXi) + e^Ei'Ei. Using the decomposition of (|4]), we may 
write T = UL^U' and f = UL^U'. Note that Xj{T) = Xj{Xi)^. Define U2 and U2 to be 
the K X — d) matrices of the left-singular vectors corresponding to the lowest singular 
values, as in Q. 

Note that by assumption, Xi{Ei) = 1, hence, Xi{Xi'Ei) < X\ < 1. By Corollary 8.1-3 



of Golub and Loan (19831, 



Xi{T) - 3e < \{T) < Xi{T) + 3e . (8) 



Let 5 = Xd{T) - Xd+i{T) - e. By Theorem 8.1-7 of [Golub and Loan] ([l983|, there is a 



d X (K — d) matrix Q such that ||Q||2 < x such that the columns of U2 = {U2 + 
UiQ){I + Q'Q)~^/'^ are an orthogonal basis for an invariant subspace of T. We want to 
show that U2 and U2 spans the same subspaces. To prove this, we bound the largest singular 
value of ||C/2^C/2||2, and the result follows from ([8|. 
First, note that 

l-y<A,((/ + Q'gr^/^)<l + ^. (9) 

Hence, 

\\U'2fU2\\^ = \\{I + Q'Qy'^''^{U2 + UiQ)'f{U2 + UiQ){I + Q'Q)-^/\ 
1 + ^ j (IIC/jTOjIIj +2||f/2f(7iQ||2+ ||Q'f/;f(7iQ| 

< (l + f)Y(A«(T) + 3.) + <^+(f)'(l+3.)). (10) 



We now obtain some bounds on the size of e. By assumption we have e < 



72 • 



Since assumption (a[i]) holds, we may assume that Xd+i{T) < Recall that 6 

Xd{T)-Xd+i{T)-e and that (Aj^)^ = Xd{T). Isolating e we obtain that s < Similarly, 
we can show that e < g^. We also have e < ? since by assumption Xd{T) < 1, and 
similarly, e < Summarizing, we have 



5 A,(r) Xd{T)5 

e < mm — , , , — (11) 

* 60 72 ' 60 60 ' ^ ' 



We are now ready to bound the expression in (10). We have that (1 -|- ^) < tq since 



e < ^; Ad+i(r) < by assumption; 3e < since e < Hj^; ^ < ^ since 
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60 



Mil. (M! 

72 ' 52 



100 



r ^ (IT) 

e < gQ and e < 'y^ . Combining all these bounds, we obtain 



^ and e < Jj; lisg < ^ since 



72 



l|f^2rf/2||2<^<A,(r)-3e. 

Hence, by ([s]) we have ||C/2T[/2||2 < ^d{T). Since U2 spans a subspace oi K — d dimension, 
it must span the subspace with the K — d vectors with lowest singular values of T. In other 
words, U2 spans the same subspace as U2 or equivalently C/2C/2 = f^2f^2- Summarizing, we 
obtained that 



l'U2U^l 

We are now ready to bound the difference between Wi and Wi 

2 



(12) 



\Wi - Wi\ 



U2U2I _ U2U2^ 
l'U2U2'l l'U2U^l 

1 



^ l'U2U2'U2U2^ I 



l'U2U2'l l'U2U2'll'U2U^l I'W^l 

1'{U2-U2){U2-U2)'1 
l'U2U2'll'U2U^l 

We use Assumption (A|2]) to obtain a bound on l'?72^2'l- Denote the projection of the 
normalized vector on the basis {uj} by pj = -^I'uj. We have that 



By assumption {-f^, < ^ 

(A^)^ Hence =iPj < ct- Since Ylij=iP] — 1, we have 

that 

K 



E P,^ = ^l'W2'l>l-a. 

Similarly, we obtain a bound on l'f/2^^2-'-- 

I'C/aC/al > ||(/ + Q'Q)"^/^?7^lf -2|l';7ig(/ + Q'Q)-iC/^l| 



(13) 



> (1 



6e. 



.6e 



6e. 



A'(l-a)-2i^^(l + ^)2(l-a)i/2 
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(1-a 



,1/2 



where we used e < Since by assumption e < 



Ad(T)V(l-») 



Ad+i(T) < 



Mil 

72 



and e < 



72 



, we obtain that e < 



72 

60 



, and using the facts that 
Hence, l'U2U^l > 



2 



16 



Finally, we obtain a bound on l'{U2 — f/2)(f^2 — f^2)'l- 

\\U2-U2\\^ = \\U2{I-{I + Q'Qy^/^) + UiQ{I + Q'Q)-^/^\\^ 

< \\U2U\1 - (I + Q'Qy'^\ + \\ui\\M\2W + 

where the last inequality follows from ([9|, the fact that for any eigenvector v of (I+Q'Q)^^/^ 
with eigenvalue A^, u is also eigenvector of / — (/ + Q'Q)~^^'^ with eigenvalue 1 — A„, and 



the fact that ||^||2 = 1 for every matrix A with orthonormal columns (see Golub and Loan 

6e / 6e\ ISKe 



1983). Consequently, 

ll(c,,_e,)'i||^<K|(2 + |)< 



5_ 
50 

Combining these results, we have 



where we used e < ^. 



I . II (13i^£)M 20e 



{K{l-a))/V2 Xd{T){l-a) 



where we used > ^. 



A.3 Proof of Theorem HH] 

Proof Since $(Z) = ^^^^ || Wij{zj — Zi)\\ , we bound each summand separately in 
order to obtain a global bound. 

Let the induced neighbors of Zi = f^^{xi) be defined by (ri, . . . , tk) = {f~^{i]i)i • ■ • > f'^iVK))- 
Note that a-priori, it is not clear that tj are neighbors of Zj. Let J be the Jacobian of the 
function / at Zj. Since / is a conformal mapping, J' J = c{zi)I, for some positive c : ^ M. 

Using first order approximation we have that rjj — Xi = J{Tj — Zi) + [\\Tj — Zi^'^ . Hence, 
for Wi we have, 

K K . 

Wij{Tj - ^j) = ^ WijJ'ijij - Xi) + { max ||t,- - Zi 

Thus we have 

K K K . . 

\\^Wij{Tj - Zi)\\ = \\^Wijj'{r]j - Xi)\\ + \\^ Wij J' {r]j - Xi)\\0 (max \\Tj - Zi\\ j. 
j=i j=i j=i V ^ / 

(14) 

We bound || X^j=i WijJ'{'nj~^i)\\ for the vector Wi that minimizes ([s]). Note that by (|4]), 
Ylf=i '^ijJ'i'Hj ~ ^i) = ''^i^t J + ^'^^72^2^2 J. However, by construction w[Xf = 0. Hence 

^ II II A* 

\\Y,WijJ\r|j-x^)\\ = \\w'iU2L2ViJ\\ < \\wi\\\\U2L2V2'j\\^ < " '^^^ 



\/c(zi) 
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where we used the facts that ||Aa;||2 < ||^||2||a;||2 for a any matrix A, and that \\A\\2 = 1 for 
a matrix A with orthonormal columns (see Section 2 of Golub and Loan 1983, for both). 



Substituting in (14), we obtain 



K 

E 

i=i 



c{zi) 



' Wij{Tj - Zi)\\'" < " + ll^i'ill^d+lC ( 

Since assumption (Aj2j) hold, it follows from (13) that 



max Ti 



< 



~ l'U2U2'l ^ K{l-a)- 

As / is an conformal mapping, we have Cmin||7"j — ^ dMiVji^i)^ where d-M is the 
geodesic metric and Cjain > is the minimum of the scale function c{z) that measures the 
scaling change of / at z . The minimum Cmin is attained as O is compact. The last inequality 
holds true since the geodesic distance dMiVji^i) is equal to the integral over c(z) for some 
path between Tj and Zi. 

The sample is assumed to be dense, hence ||rj — Xi\\ < sq, where sq is the minimum 
branch separation (see Section [5]). Using Bernstein et al. (2000), Lemma 3, we conclude 
that 

1 . , . vr M 



ZiW < 



-dMiVj,Xi) < 



2c, 



Since assumption (A[T]) holds, and 



r{{)' 



max \ \r]j 
j 



Xif > 



1 ^ 



\^i\\F 



we have A^+i <^ r{i). Hence || Yl!j=i 
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